Various Multi-Agent Reinforcement Learning (MARL) methods have been developed under the assumption that agents' policies are based on true states. Recent works have improved the robustness of MARL under uncertainties in the reward, the transition probability, or other partners' policies. However, in real-world multi-agent systems, state estimates may be perturbed by sensor measurement noise or even by adversaries. Policies trained with only true state information will deviate from optimal solutions when facing adversarial state perturbations during execution. MARL under adversarial state perturbations has received limited study. Hence, in this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to study the fundamental properties of MARL under state uncertainties. We prove that the optimal agent policy and the robust Nash equilibrium do not always exist for an SAMG. Instead, we define a solution concept for the proposed SAMG under adversarial state perturbations, the robust agent policy, in which agents aim to maximize the worst-case expected state value. We then design a gradient descent ascent-based robust MARL algorithm to learn robust policies for the MARL agents. Our experiments show that adversarial state perturbations decrease agents' rewards for several baselines from the existing literature, while our algorithm outperforms the baselines under state perturbations and significantly improves the robustness of MARL policies under state uncertainties.
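As a rough illustration of the gradient descent ascent idea mentioned above, the following minimal sketch runs simultaneous descent and ascent steps on a toy scalar saddle-point problem. It is not the paper's robust MARL algorithm: the objective, the function name `gda`, the step size, and the step count are all assumptions chosen for illustration, with `x` standing in for a minimizing player (such as a perturbing adversary) and `y` for a maximizing player (such as an agent).

```python
def gda(grad_x, grad_y, x, y, lr=0.1, steps=500):
    """Simultaneous gradient descent ascent: x descends on f, y ascends on f."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y


# Hypothetical toy objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2.
# It is convex in x and concave in y, with its saddle point at (0, 0).
x_star, y_star = gda(
    grad_x=lambda x, y: x + y,   # df/dx
    grad_y=lambda x, y: x - y,   # df/dy
    x=1.0,
    y=-2.0,
)
print(x_star, y_star)  # both values end up very close to 0
```

On this strongly convex-concave toy problem the iterates spiral into the saddle point; for general games, plain gradient descent ascent can cycle or diverge, which is one reason the step size matters.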
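Electric vehicles (EVs) play a critical role in autonomous mobility-on-demand (AMoD) systems, but their unique charging patterns increase the model uncertainties in AMoD systems (e.g., the state transition probability). Since there is usually a mismatch between the training and test (true) environments, incorporating model uncertainty into system design is essential. However, model uncertainty has not yet been explicitly considered in the existing literature on EV AMoD system rebalancing, and it remains an urgent and challenging task. In this work, we design a robust and constrained multi-agent reinforcement learning (MARL) framework for the EV rebalancing and charging problem. We then propose a robust and constrained MARL algorithm (Rocoma) that trains a robust EV rebalancing policy to balance the supply-demand ratio and the charging utilization rate across the whole city under state transition uncertainty. Experiments show that Rocoma can learn an effective and robust rebalancing policy. It outperforms non-robust MARL methods when model uncertainties are present. It increases system fairness by 19.6% and decreases rebalancing costs by 75.8%.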
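In 2D multi-slice magnetic resonance (MR) acquisition, the through-plane signal typically has lower resolution than the in-plane signal. Although contemporary super-resolution (SR) methods aim to recover the underlying high-resolution volume, the estimated high-frequency information is implicit in end-to-end data-driven training rather than being explicitly stated and sought. To address this, we reframe the SR problem statement in terms of perfect reconstruction filter banks, which enables us to identify and directly estimate the missing information. In this work, we propose a two-stage approach to approximate the completion of a perfect reconstruction filter bank corresponding to the anisotropic acquisition of a particular scan. In stage 1, we estimate the missing filters using gradient descent; in stage 2, we use deep networks to learn the mapping from coarse coefficients to detail coefficients. In addition, the proposed formulation does not rely on external training data, which circumvents the need for domain shift correction. Under our approach, SR performance is improved particularly in "slice gap" scenarios, likely due to the constraints on the solution space imposed by the framework.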
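To make the filter bank framing concrete, here is a minimal sketch of a two-channel perfect reconstruction filter bank. It uses the textbook Haar filters as a stand-in, not the scan-specific filters that the abstract estimates by gradient descent; the function names and the sample signal are assumptions for illustration. The coarse coefficients play the role of the observed low-resolution information, and the detail coefficients are the missing part that a second stage would have to predict.

```python
def haar_analysis(x):
    """Split an even-length signal into coarse (low-pass) and detail (high-pass) coefficients."""
    coarse = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return coarse, detail


def haar_synthesis(coarse, detail):
    """Invert haar_analysis: each (coarse, detail) pair yields two samples."""
    x = []
    for c, d in zip(coarse, detail):
        x.extend([c + d, c - d])
    return x


signal = [4.0, 2.0, 7.0, 1.0, 3.0, 3.0]
coarse, detail = haar_analysis(signal)

# With both channels the signal is recovered exactly (perfect reconstruction).
reconstructed = haar_synthesis(coarse, detail)

# With the detail channel zeroed out, only a blurred version is recovered.
blurred = haar_synthesis(coarse, [0.0] * len(detail))
```

Comparing `reconstructed` with `blurred` shows why estimating the detail coefficients, rather than only interpolating the coarse ones, is the crux of the problem.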
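Most deep learning (DL) based deformable image registration methods use convolutional neural networks (CNNs) to estimate displacement fields from pairs of moving and fixed images. This, however, requires the convolutional kernels in the CNN not only to extract intensity features from the inputs but also to understand image coordinate systems. We argue that the latter task is challenging for traditional CNNs, which limits their performance in registration tasks. To tackle this problem, we first introduce Coordinate Translator, a differentiable module that identifies matched features between the fixed and moving images and outputs their coordinate correspondences without the need for training. It relieves CNNs of the burden of understanding image coordinate systems, allowing them to focus on feature extraction. We then propose a novel deformable registration network, IM2Grid, that applies multiple Coordinate Translators to the hierarchical features extracted from a CNN encoder and outputs a deformation field in a coarse-to-fine fashion. We compared IM2Grid with state-of-the-art DL and non-DL methods for unsupervised 3D magnetic resonance image registration. Our experiments show that IM2Grid outperforms these methods both qualitatively and quantitatively.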
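Robust and accurate nuclei centroid detection is important for understanding biological structures in fluorescence microscopy images. Existing automated nuclei localization methods face three main challenges: (1) most object detection methods work only on 2D images and are difficult to extend to 3D volumes; (2) segmentation-based models can be used on 3D volumes but are computationally expensive for large microscopy volumes and have difficulty distinguishing different object instances; (3) hand-annotated ground truth is limited for 3D microscopy volumes. To address these issues, we present a scalable approach for nuclei centroid detection in 3D microscopy volumes. We describe RCNN-SliceNet, which detects 2D nuclei centroids in each slice of the volume from different directions, and use 3D agglomerative hierarchical clustering (AHC) to estimate the 3D centroids of the nuclei in a volume. The model was trained on synthetic microscopy data generated using Spatially Constrained Cycle-Consistent Adversarial Networks (SPCyclegan) and tested on different types of real 3D microscopy data. Extensive experimental results demonstrate that our proposed method can accurately count and detect the nuclei centroids in a 3D microscopy volume.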
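The clustering step can be sketched as follows: per-slice 2D detections are treated as 3D points (x, y, slice index), nearby points are merged bottom-up, and each resulting cluster is averaged into one 3D centroid. This is only a minimal sketch under assumptions that are not taken from the abstract: single linkage, a hand-picked distance threshold `max_dist`, made-up detections, and hypothetical function names.

```python
import math


def single_linkage(points, max_dist):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the closest remaining pair is farther apart than max_dist."""
    clusters = [[p] for p in points]

    def gap(a, b):
        # Single linkage: cluster distance is the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min(
            (gap(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > max_dist:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def centroid(cluster):
    """Mean position of the 3D points in one cluster."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(3))


# Hypothetical 2D detections lifted to 3D as (x, y, slice_index).
detections = [(10, 10, 4), (10, 11, 5), (11, 10, 6), (30, 30, 5), (31, 30, 6)]
clusters = single_linkage(detections, max_dist=3.0)
centroids = [centroid(c) for c in clusters]
```

The naive pairwise search is cubic in the number of detections, so a real volume would call for a library implementation or a spatial index; the sketch is only meant to show the merge-then-average logic.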
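In label-noise learning, estimating the transition matrix is a hot topic because the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., the clean-label transition matrix (CLTM)) has been widely exploited to learn a clean-label classifier from noisy data. Motivated by the fact that classifiers mostly output Bayes optimal labels as their predictions, in this paper we study how to directly model the transition from Bayes optimal labels to noisy labels (i.e., the Bayes-label transition matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. Note that, given only noisy data, there is not enough information to estimate either the CLTM or the BLTM. However, Bayes optimal labels have less uncertainty than clean labels: the class posteriors of Bayes optimal labels are one-hot vectors, while those of clean labels are not. This yields two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected from the noisy data; (b) the feasible solution space is much smaller. By exploiting these advantages, we estimate the BLTM parametrically with a deep neural network, leading to better generalization and superior classification performance.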
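The role of a transition matrix can be illustrated with a small worked example: the noisy-label posterior is the label posterior pushed through the matrix. This is a minimal sketch with a made-up 3-class matrix `T` and hypothetical names, and it uses a single fixed matrix even though the method described above estimates the matrix with a neural network. It shows why one-hot Bayes optimal posteriors are convenient: the noisy posterior then equals a single row of `T`.

```python
def noisy_posterior(label_posterior, T):
    """P(noisy = j | x) = sum_i P(label = i | x) * T[i][j]."""
    k = len(T)
    return [sum(label_posterior[i] * T[i][j] for i in range(k)) for j in range(k)]


# Hypothetical transition matrix: row i gives P(noisy = j | label = i).
T = [
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
]

one_hot = [0.0, 1.0, 0.0]   # Bayes optimal label posterior: class 1 with certainty
soft = [0.2, 0.6, 0.2]      # a clean-label posterior that is not one-hot

from_one_hot = noisy_posterior(one_hot, T)  # picks out row 1 of T
from_soft = noisy_posterior(soft, T)        # mixes all three rows of T
```

With the one-hot input, an example whose Bayes optimal label is known directly exposes one row of the matrix through its noisy-label distribution, whereas the soft input entangles all rows.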
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, namely by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset in the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient because it uses discrete tokens and requires fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient because it uses parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality, etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling the distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for the reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. In particular, the proposed method still generates high-quality molecular graphs in a limited number of steps.